Abstract: The "water-filling" solution for the quadratic
rate-distortion function of a stationary Gaussian source is given in
terms of its power spectrum. This formula naturally lends itself to
a frequency-domain "test-channel" realization. We provide an
alternative time-domain realization for the rate-distortion function,
based on linear prediction. The predictive test-channel has some
interesting implications, including the optimality at all distortion
levels of pre/post-filtered vector-quantized differential pulse code
modulation (DPCM), and a duality relationship with
decision-feedback equalization (DFE) for inter-symbol interference (ISI)
channels.

Keywords: Test channel, water-filling, pre/post-filtering, 
DPCM, Shannon lower bound, ECDQ, directed-information, 
equalization, MMSE estimation, decision feedback. 

I. Introduction 

The water-filling solution for the quadratic rate-distortion 
function R(D) of a stationary Gaussian source is given in 
terms of the spectrum of the source. Similarly, the capacity 
C of a power-constrained ISI channel with Gaussian noise is 
given by a water-filling solution relative to the effective noise 
spectrum. Both these formulas amount to limiting values of 
mutual-information between vectors in the frequency domain. 
In contrast, linear prediction along the time domain can 
translate these vector mutual-information quantities into scalar 
ones. Indeed, for capacity, Cioffi et al. [4] showed that C is
equal to the scalar mutual-information over a slicer embedded 
in a decision-feedback noise-prediction loop. 

We show that a parallel result holds for the rate-distortion 
function: R(D) is equal to the scalar mutual-information over 
an additive white Gaussian noise (AWGN) channel embedded 
in a source prediction loop, as shown in Figure 1. This
result implies that R(D) can essentially be realized in a 
sequential manner (as will be clarified later), and it joins other 
observations regarding the role of minimum mean-square error 
(MMSE) estimation in successive encoding and decoding of 
Gaussian channels and sources [7], [6], [3]. 

The Quadratic-Gaussian Rate-Distortion Function 

The rate-distortion function (RDF) of a stationary source
with memory is given as a limit of normalized mutual information
associated with vectors of source samples. For a real-valued
source $\{X_n\} = \ldots, X_{-2}, X_{-1}, X_0, X_1, X_2, \ldots$, and
expected mean-squared distortion level $D$, the RDF can be
written as [2]

$$R(D) = \lim_{n \to \infty} \frac{1}{n} \inf I(X_1, \ldots, X_n; Y_1, \ldots, Y_n),$$

where the infimum is over all channels $X \to Y$ such that
$\frac{1}{n} E\|Y - X\|^2 \le D$. A channel which realizes this infimum is
called an optimum test-channel. When the source is zero-mean 
Gaussian, the RDF takes an explicit form in the frequency 
domain in terms of the power-spectrum 



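$$S(e^{j2\pi f}) = \sum_k R[k] e^{-jk2\pi f}, \qquad -1/2 < f < 1/2,$$

where $R[k] = E\{X_n X_{n+k}\}$ is the auto-correlation function of
the source. The water-filling solution, illustrated in Figure 2,
gives a parametric formula for the Gaussian RDF in terms of
a parameter $\theta$ [8], [2], [5]:

$$R(D) = \int_{-1/2}^{1/2} \frac{1}{2} \log\left(\frac{S(e^{j2\pi f})}{D(e^{j2\pi f})}\right) df, \tag{1}$$

where the distortion spectrum is given by

$$D(e^{j2\pi f}) = \begin{cases} \theta, & \text{if } S(e^{j2\pi f}) > \theta \\ S(e^{j2\pi f}), & \text{otherwise,} \end{cases} \tag{2}$$

and where the water level $\theta$ is chosen so that the total distortion is $D$:

$$D = \int_{-1/2}^{1/2} D(e^{j2\pi f}) \, df. \tag{3}$$

In the special case of a memoryless (white) Gaussian source
$\sim \mathcal{N}(0, \sigma^2)$, the power-spectrum is flat, $S(e^{j2\pi f}) = \sigma^2$,
so $\theta = D$ and the RDF is simplified to

$$R(D) = \frac{1}{2} \log\left(\frac{\sigma^2}{D}\right), \qquad 0 < D \le \sigma^2. \tag{4}$$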
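As a minimal numerical sketch of the water-filling solution, the following Python code samples a spectrum on a uniform frequency grid, finds the water level $\theta$ by bisection so that the average of $\min\{\theta, S\}$ equals $D$, and evaluates the rate in bits per sample. The function names, the grid discretization, and the AR(1) example spectrum are illustrative assumptions and do not appear in the derivation above.

```python
import math

def water_level(spectrum, D, iters=200):
    """Bisection for theta such that mean(min(theta, S_i)) = D.

    Assumes strictly positive spectrum samples and 0 < D <= mean(spectrum).
    """
    lo, hi = 0.0, max(spectrum)
    for _ in range(iters):
        theta = 0.5 * (lo + hi)
        total = sum(min(theta, s) for s in spectrum) / len(spectrum)
        if total < D:
            lo = theta
        else:
            hi = theta
    return 0.5 * (lo + hi)

def gaussian_rdf_bits(spectrum, D):
    """Grid approximation of the water-filling rate, in bits per sample."""
    theta = water_level(spectrum, D)
    rate = sum(0.5 * math.log2(s / min(theta, s)) for s in spectrum) / len(spectrum)
    return rate, theta

def ar1_spectrum(a, noise_var, num_points=1024):
    """Hypothetical example: AR(1) spectrum noise_var / |1 - a e^{-j 2 pi f}|^2."""
    freqs = [(k + 0.5) / num_points - 0.5 for k in range(num_points)]
    return [noise_var / (1.0 + a * a - 2.0 * a * math.cos(2.0 * math.pi * f))
            for f in freqs]

# White source with variance 4 at distortion 1: the water level equals D.
white_rate, white_theta = gaussian_rdf_bits([4.0] * 64, 1.0)
# AR(1) source at a distortion below the minimum of its spectrum.
ar_rate, ar_theta = gaussian_rdf_bits(ar1_spectrum(0.9, 1.0), 0.1)
```

For the white example the sketch reproduces the memoryless formula, $\frac{1}{2}\log_2(\sigma^2/D)$; for the AR(1) example the distortion lies below the spectrum everywhere, so the water level again equals $D$.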



The optimum test-channel can be written in this case in a
backward additive-noise form: $X = Y + N$, with $N \sim \mathcal{N}(0, D)$,
or in a forward linear additive-noise form:

$$Y = \beta(\alpha X + N)$$

with $\alpha = \beta = \sqrt{1 - D/\sigma^2}$ and $N \sim \mathcal{N}(0, D)$. In the general
stationary case, the forward channel realization of the Gaussian
RDF has several equivalent forms [8, Sec. 9.7], [2, Sec. 4.5].
The one which is more useful for our purpose replaces $\alpha$ and $\beta$
above by linear time-invariant filters, while keeping the noise
$N$ as AWGN [18]:

$$Y_n = h_{2,n} * (h_{1,n} * X_n + N_n) \tag{5}$$

where $N_n \sim \mathcal{N}(0, \theta)$ is AWGN with $\theta = \theta(D)$ = the water
level, $*$ denotes convolution, and $h_{1,n}$ and $h_{2,n}$ are the impulse
responses of a suitable pre-filter and post-filter, respectively.
See (13)-(18) in the next section.

Fig. 1. Predictive Test Channel.

Fig. 2. The water filling solution.
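The scalar forward form $Y = \beta(\alpha X + N)$ can be checked in closed form. The sketch below, with hypothetical function and variable names, computes the mean-squared error and the mutual information of this channel for a white Gaussian source, using only the independence of $X$ and $N$.

```python
import math

def forward_test_channel(sigma2, D):
    """Second-order statistics of Y = beta * (alpha * X + N), N ~ N(0, D)."""
    alpha = beta = math.sqrt(1.0 - D / sigma2)
    # Y - X = (alpha * beta - 1) * X + beta * N, with X and N independent.
    distortion = (alpha * beta - 1.0) ** 2 * sigma2 + beta ** 2 * D
    # I(X; Y) = I(X; alpha * X + N) since scaling by beta != 0 is invertible.
    rate_bits = 0.5 * math.log2(1.0 + alpha ** 2 * sigma2 / D)
    return distortion, rate_bits

distortion, rate_bits = forward_test_channel(sigma2=4.0, D=1.0)
```

With $\sigma^2 = 4$ and $D = 1$ the distortion evaluates to $D$ and the rate to $\frac{1}{2}\log_2(\sigma^2/D)$, in agreement with the memoryless RDF.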

If we take a discrete approximation of (1),

$$\sum_i \frac{1}{2} \log\left(\frac{S(e^{j2\pi f_i})}{D(e^{j2\pi f_i})}\right), \tag{6}$$

then each component has the memoryless form of (4). Hence,
we can think of the frequency domain formula (1) as an encoding
of parallel (independent) Gaussian sources, where source
$i$ is a memoryless Gaussian source $X_i \sim \mathcal{N}(0, S(e^{j2\pi f_i}))$
encoded at distortion level $D(e^{j2\pi f_i})$; see [5]. Indeed, practical
frequency domain source coding schemes such as Transform 
Coding and Sub-band Coding [10] get close to the RDF of a 
stationary Gaussian source using an "array" of parallel scalar 
quantizers. 



Rate-Distortion and Prediction 

Our main result is a predictive channel realization for the
quadratic-Gaussian RDF (1), which can be viewed as the
time-domain counterpart of the frequency domain formulation
above. The notions of entropy-power and Shannon lower
bound (SLB) provide a simple relation between the Gaussian
RDF and prediction, and motivate our result. Recall that the
entropy-power is the variance of a white Gaussian process
having the same entropy-rate as the source [5]; for a zero-mean
Gaussian source with power-spectrum $S(e^{j2\pi f})$, the
entropy-power is given by

$$P_e(X) = \exp\left(\int_{-1/2}^{1/2} \log\left(S(e^{j2\pi f})\right) df\right). \tag{7}$$

In the context of Wiener's spectral-factorization theory,
the entropy-power quantifies the MMSE in one-step linear
prediction of a Gaussian source from its infinite past [2]:

$$P_e(X) = \inf_{\{a_i\}} E\left(X_n - \sum_{i=1}^{\infty} a_i X_{n-i}\right)^2. \tag{8}$$

The error process associated with the infinite-order optimum
predictor,

$$Z_n = X_n - \sum_{i=1}^{\infty} a_i X_{n-i}, \tag{9}$$

is called the innovation process. The orthogonality principle
of MMSE estimation implies that the innovation process has
zero mean and is white; in the Gaussian case uncorrelatedness
implies independence, so

$$Z_n \sim \mathcal{N}(0, P_e(X)) \tag{10}$$

is a memoryless process. See, e.g., [7].
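To illustrate the link between entropy-power and one-step prediction, the sketch below evaluates the entropy-power formula on a frequency grid for an AR(1) source $X_n = a X_{n-1} + W_n$, whose innovation is $W_n$. The AR(1) model, the grid size, and the function names are assumptions chosen for illustration; for this model the entropy-power should equal the variance of $W_n$, while the source variance is larger by the factor $1/(1 - a^2)$.

```python
import math

def entropy_power(spectrum):
    """Grid version of the entropy-power: exp of the average log-spectrum."""
    return math.exp(sum(math.log(s) for s in spectrum) / len(spectrum))

def ar1_spectrum(a, noise_var, num_points=4096):
    """Hypothetical example: AR(1) spectrum noise_var / |1 - a e^{-j 2 pi f}|^2."""
    freqs = [(k + 0.5) / num_points - 0.5 for k in range(num_points)]
    return [noise_var / (1.0 + a * a - 2.0 * a * math.cos(2.0 * math.pi * f))
            for f in freqs]

spectrum = ar1_spectrum(a=0.9, noise_var=1.0)
pe = entropy_power(spectrum)                      # innovation variance
source_var = sum(spectrum) / len(spectrum)        # variance of X_n
prediction_gain = source_var / pe
```

The ratio of source variance to entropy-power is the gain available from ideal prediction of the clean source.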

From an information-theoretic perspective, the entropy-power
plays a role in the SLB:

$$R(D) \ge \frac{1}{2} \log\left(\frac{P_e(X)}{D}\right). \tag{11}$$

Equality in the SLB holds if the distortion level is smaller
than or equal to the lowest value of the power spectrum:
$D \le S_{\min} \triangleq \min_f S(e^{j2\pi f})$, in which case $D(e^{j2\pi f}) = \theta = D$
[2]. It follows that for distortion levels below $S_{\min}$ the RDF
of a Gaussian source with memory is equal to the RDF of its
memoryless innovation process $Z_n$:

$$R(D) = R_Z(D) = \frac{1}{2} \log\left(\frac{\sigma_Z^2}{D}\right), \qquad D \le S_{\min}, \tag{12}$$

where $\sigma_Z^2 = P_e(X)$.

We shall see later in Section II how identity (12) translates
into a predictive test-channel, which can realize the RDF not 
only for small but for all distortion levels. This test channel 
is motivated by the sequential structure of Differential Pulse 
Code Modulation (DPCM) [12], [10]. The goal of DPCM is 
to translate the encoding of dependent source samples into 
a series of independent encodings. The task of removing the 
time dependence is achieved by (linear) prediction: at each 
time instant the incoming source sample is predicted from 
previously encoded samples, the prediction error is encoded 
by a scalar quantizer and added to the predicted value to form 
the new reconstruction. See Figure 3.

Fig. 3. DPCM Quantization Scheme.
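The DPCM loop described above can be sketched in a few lines. The code below is an illustrative first-order closed-loop DPCM with a uniform scalar quantizer; the predictor coefficient, the step size, the zero initial state, and the test signal are all hypothetical choices. Because the prediction is formed from the reconstruction, the reconstruction error at each time equals the quantization error of the prediction residual and is therefore bounded by half a step.

```python
import math

def dpcm(samples, a=0.9, step=0.5):
    """First-order closed-loop DPCM with a uniform scalar quantizer."""
    indices = []
    reconstruction = []
    prev = 0.0  # assumed zero initial state of the predictor
    for x in samples:
        prediction = a * prev              # predict from the previous reconstruction
        error = x - prediction             # prediction error
        index = round(error / step)        # scalar quantizer index
        prev = prediction + index * step   # new reconstruction
        indices.append(index)
        reconstruction.append(prev)
    return indices, reconstruction

source = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n) for n in range(200)]
indices, reconstruction = dpcm(source)
max_error = max(abs(x - r) for x, r in zip(source, reconstruction))
```

A decoder that receives only `indices` can run the same recursion to rebuild `reconstruction`, which is why the predictor must use previously encoded samples rather than the clean source.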



A negative result along this direction was recently given 
by Kim and Berger [13]. They showed that the RDF of an 
auto-regressive (AR) Gaussian process cannot be achieved by 
directly encoding its innovation process. This can be viewed 
as open-loop prediction, because the innovation process is 
extracted from the clean source rather than from the quantized 
source [12], [9]. Here we give a positive result, showing that 
the RDF can be achieved if we embed the quantizer inside 
the prediction loop, i.e., by closed-loop prediction as done 
in DPCM. The RDF-achieving system consists of pre- and 
post-filters, and an AWGN channel embedded in a source 
prediction loop. As we show, the scalar (un-conditioned) 
mutual information over this inner AWGN channel is equal 
to the RDF. 

After presenting and proving our main result in Sections II
and III, respectively, we discuss its characteristics
and operational implications. Section IV discusses the spectral 
features of the solution. Section V relates the solution to 
vector-quantized DPCM of parallel sources. Section VI shows 
an implementation by Entropy Coded Dithered Quantization 
(ECDQ), while extending the ECDQ rate formula [16] to the 
case of a system with feedback. Finally, in Section VII we 
relate prediction in source coding to prediction for channel 
equalization and to recent observations by Forney [7]. As in 
[7], our analysis is based on the properties of information 
measures; the only result we need from Wiener's estimation 
theory is the orthogonality principle. 



II. Main Result 

Consider the system in Figure 1, which consists of three
basic blocks: a pre-filter $H_1(e^{j2\pi f})$, a noisy channel embedded
in a closed loop, and a post-filter $H_2(e^{j2\pi f})$, where $H(e^{j2\pi f})$
denotes the frequency response of a filter with impulse
response $h_n$,

$$H(e^{j2\pi f}) = \sum_n h_n e^{-jn2\pi f}, \qquad -1/2 < f < 1/2.$$

The system parameters are derived from the water-filling
solution (1)-(3), and depend on the source spectrum $S(e^{j2\pi f})$
and the distortion level $D$. The source samples $\{X_n\}$ are
passed through a pre-filter, whose phase is arbitrary and whose
absolute squared frequency response is given by

$$|H_1(e^{j2\pi f})|^2 = 1 - \frac{D(e^{j2\pi f})}{S(e^{j2\pi f})} \tag{13}$$

where $\frac{0}{0}$ is taken as 1. The pre-filter output, denoted $U_n$, is fed
to the central block which generates a process $V_n$ according
to the following recursion equations:

$$\hat{U}_n = g(V_{n-1}, V_{n-2}, \ldots, V_{n-L}) \tag{14}$$
$$Z_n = U_n - \hat{U}_n \tag{15}$$
$$Zq_n = Z_n + N_n \tag{16}$$
$$V_n = \hat{U}_n + Zq_n \tag{17}$$

where $N_n \sim \mathcal{N}(0, \theta)$ is a zero-mean white Gaussian noise,
independent of the input process $\{U_n\}$, whose variance is
equal to the water level $\theta = \theta(D)$; and $g(\cdot)$ is some prediction
function for the input $U_n$ given the $L$ past samples of the
output process $(V_{n-1}, V_{n-2}, \ldots, V_{n-L})$.¹ Finally, the
post-filter frequency response is the complex conjugate of the
frequency response of the pre-filter,

$$H_2(e^{j2\pi f}) = H_1^*(e^{j2\pi f}). \tag{18}$$

Equivalently, the impulse response of the post-filter is the
reflection of the impulse response of the pre-filter:

$$h_{2,n} = h_{1,-n}. \tag{19}$$

See a comment regarding causality at the end of the section.
The block from $U_n$ to $V_n$ is equivalent to the configuration
of DPCM [12], [10], with the DPCM quantizer replaced by
the additive Gaussian noise channel $Zq_n = Z_n + N_n$. In
particular, the recursion equations (14)-(17) imply that this
block satisfies the well-known "DPCM error identity" [12],

$$V_n = U_n + (Zq_n - Z_n) = U_n + N_n. \tag{20}$$

That is, the output $V_n$ is a noisy version of the input $U_n$
via the AWGN channel $V_n = U_n + N_n$. Thus, the system
of Figure 1 is equivalent to the system depicted in Figure 4,
which corresponds to the forward channel realization (5) of
the quadratic-Gaussian RDF.
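The central block can be simulated directly to observe the error identity $V_n = U_n + N_n$. The sketch below assumes a linear predictor with arbitrary (not optimized) coefficients and a zero initial state for the missing past outputs; both are illustrative assumptions, as are all names. The identity holds sample by sample regardless of the predictor, since the prediction is added back after the noise is injected.

```python
import random

def central_block(u, theta, coeffs, seed=0):
    """Prediction loop around an additive Gaussian noise channel of variance theta."""
    rng = random.Random(seed)
    order = len(coeffs)
    v = []       # outputs V_n
    noise = []   # injected noise samples N_n
    for n, u_n in enumerate(u):
        # Past outputs V_{n-1}, ..., V_{n-L}; samples before time 0 are taken as 0.
        past = [v[n - i] if n - i >= 0 else 0.0 for i in range(1, order + 1)]
        u_hat = sum(a * p for a, p in zip(coeffs, past))   # predictor output
        z = u_n - u_hat                                    # prediction error Z_n
        n_n = rng.gauss(0.0, theta ** 0.5)
        zq = z + n_n                                       # noisy error Zq_n
        v.append(u_hat + zq)                               # output V_n
        noise.append(n_n)
    return v, noise

u = [0.1 * k - 0.003 * k * k for k in range(100)]
v, noise = central_block(u, theta=0.25, coeffs=[0.5, 0.2])
max_dev = max(abs(v_n - (u_n + n_n)) for v_n, u_n, n_n in zip(v, u, noise))
```

The predictor affects only the statistics of the internal signal $Z_n$, and hence the scalar rate, not the input-output relation of the block.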

In DPCM the prediction function $g$ is linear:

$$g(V_{n-1}, \ldots, V_{n-L}) = \sum_{i=1}^{L} a_i V_{n-i} \tag{21}$$

where $a_1, \ldots, a_L$ are chosen to minimize the mean-squared
prediction error:

$$\sigma_L^2 = \min_{\{a_i\}} E\left(U_n - \sum_{i=1}^{L} a_i V_{n-i}\right)^2. \tag{22}$$

Because $V_n$ is the result of passing $U_n$ through an AWGN
channel, we call this "noisy prediction". If $\{U_n\}$ and $\{V_n\}$ are
jointly Gaussian, then the best predictor of any order is linear,
so $\sigma_L^2$ is also the MMSE in estimating $U_n$ from the vector
$(V_{n-1}, \ldots, V_{n-L})$. Clearly, this MMSE is non-increasing with
the prediction order $L$, and as $L$ goes to infinity it converges
to

$$\sigma_\infty^2 = \lim_{L \to \infty} \sigma_L^2, \tag{23}$$

the optimum infinite order prediction error in $U_n$ given the
past

$$V_n^- \triangleq \{V_{n-1}, V_{n-2}, \ldots\}. \tag{24}$$

We shall see later in Section IV that $\sigma_\infty^2 = P_e(V) - \theta$. We
further elaborate on the relationship with DPCM in Section V.
We now state our main result.

¹ No initial condition on $V_n$ is needed as we assume a two-sided input
process $X_n$, and the system is stable.



Theorem 1: (Predictive test channel) For any stationary
source with power spectrum $S(e^{j2\pi f})$ and distortion level $D$,
the system of Figure 1, with the pre-filter (13) and the
post-filter (18), satisfies

$$E(Y_n - X_n)^2 = D. \tag{25}$$

Furthermore, if the source $X_n$ is Gaussian and $g = g(V_n^-)$
achieves the optimum infinite order prediction error $\sigma_\infty^2$ (23),
then

$$I(Z_n; Z_n + N_n) = \frac{1}{2} \log\left(1 + \frac{\sigma_\infty^2}{\theta}\right) = R(D), \tag{26}$$

where the left hand side is the scalar mutual information over
the channel (16).



The proof is given in Section III. The result above is in
sharp contrast to the classical realization of the RDF (5),
which involves mutual information rate over a test-channel
with memory. In a sense, the core of the encoding process in
the system of Figure 1 amounts to a memoryless AWGN
test-channel (although, as we discuss in the sequel, the channel
(16) is not quite memoryless nor additive). From a practical
perspective, this system provides a bridge between DPCM and
rate-distortion theory for a general distortion level $D > 0$.

Another interesting feature of the system is the relationship
between the prediction error process $Z_n$ and the original
process $X_n$. If $X_n$ is an auto-regressive (AR) process, then
in the limit of small distortion ($D \to 0$), $Z_n$ is roughly its
innovation process (9). Hence, unlike in open-loop prediction
[13], encoding the innovations in a closed-loop system is
optimal in the limit of high-resolution encoding. We shall
return to this point, as well as discuss the case of general
resolution, in Section IV.

Finally, we note that while the central block of the system
is sequential and hence causal, the pre- and post-filters are
non-causal and therefore their realization in practice requires
delay. Specifically, since by (19) $h_{2,n} = h_{1,-n}$, if one of the
filters is causal then the other must be anti-causal. Often the
filter's response is infinite, hence the required delay is infinite
as well. Of course, one can approximate the desired spectrum
(in the $L_2$ sense and hence also in the rate-distortion sense) to any
degree using filters of sufficiently large but finite delay $\delta$, so
the system distortion is actually measured between $Y_n$ and
$X_{n-\delta}$. In this sense, Theorem 1 holds in general in the limit
as the system delay $\delta$ goes to infinity.

If we insist on a system with causal reconstruction ($\delta = 0$),
then we cannot realize the pre- and post-filters (13) and
(18), and some loss in performance must be paid. Nevertheless,
if the source spectrum is bounded from below by a positive
constant, then it can be seen from (13) that in the limit of small
distortion ($D \to 0$) the filters can be omitted, i.e., $H_1 = H_2 = 1$
for all $f$. Hence, a causal system (the central block in Figure 1)
is asymptotically optimal at "high resolution" conditions.
Furthermore, the redundancy of an AWGN channel above the
RDF is at most 0.5 bit per source sample for any source and at
any resolution; see, e.g., [16]. It thus follows from Lemma 1
below (which directly characterizes the information rate of the
central block of Figure 1) that a causal system (the system
of Figure 1 without the filters) loses at most 0.5 bit at any
resolution.

These observations shed some light on the "cost of causality"
in encoding stationary Gaussian sources [14]. It is an open
question, though, whether a redundancy better than 0.5 bit can
be guaranteed when using causal pre- and post-filters in the
system of Figure 1.



III. Proof of Main Result 

We start with Lemma 1 below, which shows an identity
between the mutual information rate over the central block of
Figure 1 and the scalar mutual information (26). This identity
holds regardless of the pre- and post-filters, and only assumes
optimum infinite order prediction in the feedback loop.

Let

$$\bar{I}(\{U_n\}; \{V_n\}) = \lim_{n \to \infty} \frac{1}{n} I(U_1, \ldots, U_n; V_1, \ldots, V_n) \tag{27}$$

denote mutual information-rate between jointly stationary
sources $\{U_n\}$ and $\{V_n\}$, whenever the limit exists.

Lemma 1: For any stationary Gaussian process $\{U_n\}$ in
Figure 1, if $\hat{U}_n$ is the optimum infinite order predictor of $U_n$
from $V_n^-$ (so the variance of $Z_n$ is $\sigma_\infty^2$ as defined in (23)),
then

$$\bar{I}(\{U_n\}; \{V_n\}) = I(Z_n; Z_n + N_n). \tag{28}$$



Proof: For any finite order predictor $g(V_{i-L}^{i-1})$ we can
write

$$I(\{U_n\}; V_i | V_{i-L}^{i-1}) = I(\{U_n\}; U_i - \hat{U}_i^{(L)} + N_i | V_{i-L}^{i-1})$$
$$= I(\{U_n\}; Z_i^{(L)} + N_i | V_{i-L}^{i-1})$$
$$= I(Z_i^{(L)}; Z_i^{(L)} + N_i | V_{i-L}^{i-1}) \tag{29}$$
$$= I(Z_i^{(L)}; Z_i^{(L)} + N_i) \tag{30}$$

where $\hat{U}_i^{(L)} = g(V_{i-L}^{i-1})$ is the $L$-th order predictor output at
time $i$, and $Z_i^{(L)} = U_i - \hat{U}_i^{(L)}$ is the prediction error. The first equality
above follows since manipulating the condition does not
affect the conditional mutual information; the second equality
follows from the definition of $Z_i^{(L)}$; (29) follows since $N_i$ is
independent of $(\{U_n\}, V^{i-1})$ and therefore

$$(Z_i^{(L)} + N_i) - (Z_i^{(L)}, V_{i-L}^{i-1}) - \{U_n\}$$

form a Markov chain; and (30) follows from two facts: first,
since $N_i$ is independent of $\{U_i\}$ and previous $N_j$'s, it is
also independent of the pair $(Z_i^{(L)}, V_{i-L}^{i-1})$ by the recursive
structure of the system; second, we assume optimum (MMSE)
prediction, hence the orthogonality principle implies that the
prediction error $Z_i^{(L)}$ is orthogonal to the measurements $V_{i-L}^{i-1}$,
so by Gaussianity they are also independent, and hence by
the two facts we have that $V_{i-L}^{i-1}$ is independent of the pair
$(Z_i^{(L)}, N_i)$. Since by (22) the variance of the $L$-th order
prediction error $Z_i^{(L)}$ is $\sigma_L^2$, while the variance of the noise
$N_i$ is $\theta$, we thus obtain from (30)

$$I(\{U_n\}; V_i | V_{i-L}^{i-1}) = \frac{1}{2} \log\left(1 + \frac{\sigma_L^2}{\theta}\right). \tag{31}$$

This implies in the limit as $L \to \infty$

$$I(\{U_n\}; V_i | V_i^-) = \frac{1}{2} \log\left(1 + \frac{\sigma_\infty^2}{\theta}\right) \tag{32}$$
$$= I(Z_n; Z_n + N_n). \tag{33}$$

Note that by stationarity, $I(\{U_n\}; V_i | V_i^-)$ is independent of $i$.
Thus,

$$I(\{U_n\}; V_1) + I(\{U_n\}; V_2 | V_1) + \ldots + I(\{U_n\}; V_i | V^{i-1})$$

normalized by $1/i$ converges as $i \to \infty$ to $I(\{U_n\}; V_i | V_i^-)$.
By the definition of mutual information rate (27) and by the
chain rule for mutual information [5], this implies that the left
hand side of (28) is equal to

$$\bar{I}(\{U_n\}; \{V_n\}) = I(\{U_n\}; V_i | V_i^-). \tag{34}$$

Combining (33) and (34) the lemma is proved. ■



Theorem 1 is a simple consequence of Lemma 1 above and
the forward channel realization of the RDF. As discussed in
the previous section, the DPCM error identity (20) implies
that the entire system of Figure 1 is equivalent to the system
depicted in Figure 4, consisting of a pre-filter (13), an AWGN
channel with noise variance $\theta$, and a post-filter (18). This is
also the forward channel realization (5) of the RDF [8], [2],
[18]. In particular, as simple spectral analysis shows, the power
spectrum of the overall error process $Y_n - X_n$ is equal to the
water-filling distortion spectrum $D(e^{j2\pi f})$ in (2). Hence, by
(3) the total distortion is $D$, and (25) follows.
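The "simple spectral analysis" can be made concrete per frequency bin. In the sketch below (function and variable names are hypothetical), the error $Y - X$ is the sum of a filtered source term with response $H_1 H_2 - 1$ and a filtered noise term with response $H_2$. Since $H_2 = H_1^*$, we have $H_1 H_2 = |H_1|^2 = 1 - D(e^{j2\pi f})/S(e^{j2\pi f})$, and the two contributions add up to the water-filling distortion spectrum.

```python
def error_spectrum(S, theta):
    """Error spectrum of Y - X at one frequency, for source level S and water level theta."""
    D_f = min(theta, S)                    # water-filling distortion spectrum
    h_sq = 1.0 - D_f / S                   # |H1|^2 = |H2|^2 = H1 * H2
    signal_part = (h_sq - 1.0) ** 2 * S    # source passed through (H1 * H2 - 1)
    noise_part = h_sq * theta              # channel noise passed through H2
    return signal_part + noise_part, D_f

# Check bins below, at, and above the water level theta = 1.
checks = [error_spectrum(S, 1.0) for S in (0.2, 1.0, 3.0, 10.0)]
max_gap = max(abs(err - D_f) for err, D_f in checks)
```

Below the water level the filters vanish and the whole source component counts as error; above it the error equals $\theta$.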

We turn to prove the second part of the theorem
(equation (26)). Since the system of Figure 4 is equivalent to the
forward channel realization (5) of the RDF of $\{X_n\}$, we have

$$\bar{I}(\{X_n\}; \{Y_n\}) = R(D) \tag{35}$$

where $\bar{I}$ denotes mutual information-rate (27). Since $\{U_n\}$ is
a function of $\{X_n\}$, and since the post-filter $H_2$ is invertible
within the pass-band of the pre-filter $H_1$, we also have

$$\bar{I}(\{X_n\}; \{Y_n\}) = \bar{I}(\{U_n\}; \{V_n\}). \tag{36}$$

The theorem now follows by combining (36), (35) and
Lemma 1.

An alternative proof of Theorem 1, based only on spectral
considerations, is given at the end of the next section.



IV. Properties of the Predictive Test-Channel 

The following observations shed light on the behavior of
the test channel of Figure 1.

Prediction in the high resolution regime. If the power-spectrum
$S(e^{j2\pi f})$ is everywhere positive (e.g., if $\{X_n\}$ can
be represented as an AR process), then in the limit of small
distortion $D \to 0$, the pre- and post-filters (13), (18) converge
to all-pass filters, and the power spectrum of $U_n$ becomes
the power spectrum of the source $X_n$. Furthermore, noisy
prediction of $U_n$ (from the "noisy past" $V_n^-$, where $V_n = U_n + N_n$)
becomes equivalent to clean prediction of $U_n$ from
its own past $U_n^-$. Hence, in this limit the prediction error $Z_n$
is equivalent to the innovation process of $X_n$ (9). In particular,
$Z_n$ is an i.i.d. process whose variance is $P_e(X)$ = the
entropy-power of the source (7).

Prediction in the general case. Interestingly, for general
distortion $D > 0$, the prediction error $Z_n$ is not white, as the
noisiness of the past does not allow the predictor $g$ to remove
all the source memory. Nevertheless, the noisy version of the
prediction error $Zq_n = Z_n + N_n$ is white for every $D > 0$,
because it amounts to predicting $V_n$ from its own infinite
past: since $N_n$ has zero-mean and is white (and therefore
independent of the past), $\hat{U}_n$ that minimizes the prediction
error of $U_n$ is also the optimal predictor for $V_n = U_n + N_n$.
In particular, in view of (8) and (10), we have

$$Zq_n \sim \mathcal{N}(0, P_e(V)) \tag{37}$$

where $P_e(V)$ is the entropy-power of the process $V_n$. And
since $Zq_n$ is the independent sum of $Z_n$ and $N_n$, we also
have the relation

$$P_e(V) = \sigma_\infty^2 + \theta$$

where $\sigma_\infty^2$ is the variance of $Z_n$ (23) and $\theta$ is the variance of
$N_n$.

Sequential Additivity. The whiteness of $Zq_n$ might seem
at first a contradiction, because $Zq_n$ is the sum of a non-white
process, $Z_n$, and a white process $N_n$; nevertheless, $\{Z_n\}$ and
$\{N_n\}$ are not independent, because $Z_n$ depends on past values
of $N_n$ through the feedback loop and the past of $V_n$. Thus, the
channel $Zq_n = Z_n + N_n$ is not quite additive but "sequentially
additive": each new noise sample is independent of the present
and the past but not necessarily of the future. In particular, this
channel satisfies:

$$I(Z_n; Z_n + N_n | Z_1 + N_1, \ldots, Z_{n-1} + N_{n-1}) = I(Z_n; Z_n + N_n), \tag{38}$$

so by the chain rule for mutual information

$$\bar{I}(\{Z_n\}; \{Z_n + N_n\}) > I(Z_n; Z_n + N_n).$$

Later in Section VI we rewrite (38) in terms of directed mutual
information.


The channel when the SLB is tight. As long as $D$ is
smaller than the lowest point of the source power spectrum
$S_{\min}$, we have $D(e^{j2\pi f}) = \theta = D$ in (1), and the
quadratic-Gaussian RDF coincides with the SLB (11). In this case, the
following properties hold for the predictive test channel:

• The power spectra of $U_n$ and $Y_n$ are the same and are
equal to $S(e^{j2\pi f}) - D$.

• The power spectrum of $V_n$ is equal to the power spectrum
of the source $S(e^{j2\pi f})$.

• The variance of $Zq_n$ is equal to the entropy-power of $V_n$
by (37), which is equal to $P_e(X)$.

• As a consequence we have

$$I(Z_n; Z_n + N_n) = h(Zq_n) - h(N_n) = h(\mathcal{N}(0, P_e(V))) - h(\mathcal{N}(0, D)) = \frac{1}{2} \log\left(\frac{P_e(X)}{D}\right)$$

which is indeed the SLB (11).

As discussed in the Introduction, the SLB is also the RDF of
the innovation process (12), i.e., the conditional RDF of the
source $X_n$ given its infinite clean past $X_n^-$.

An alternative derivation of Theorem 1 in the spectral
domain. For a general $D$, we can use (37) and the equivalent
channel of Figure 4 to re-derive the scalar mutual information -
RDF identity (26). Note that for any $D$ the power spectrum of
$U_n$ and $Y_n$ is equal to $\max\{0, S(e^{j2\pi f}) - \theta\}$, where $\theta = \theta(D)$
is the water-level. Thus the power spectrum of $V_n = U_n + N_n$
is given by $\max\{\theta, S(e^{j2\pi f})\}$. Since as discussed above the
variance of $Zq_n = Z_n + N_n$ is given by the entropy power of
the process $V_n$, we have

$$I(Z_n; Z_n + N_n) = \frac{1}{2} \log\left(\frac{P_e(\max\{\theta, S(e^{j2\pi f})\})}{\theta}\right) = R(D)$$

where $P_e(\cdot)$ as a function of the spectrum is given in (7), and
the second equality follows from (1).
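The spectral-domain identity can be verified numerically at a distortion level where the SLB is not tight. The sketch below uses a hypothetical AR(1) spectrum and a water level above the spectrum minimum; it computes the entropy-power of the process with spectrum $\max\{\theta, S\}$, the resulting prediction error variance $P_e(V) - \theta$, the scalar mutual information, and the water-filling rate on the same frequency grid. All names and parameter values are illustrative assumptions.

```python
import math

def ar1_spectrum(a, noise_var, num_points=2048):
    """Hypothetical example: AR(1) spectrum noise_var / |1 - a e^{-j 2 pi f}|^2."""
    freqs = [(k + 0.5) / num_points - 0.5 for k in range(num_points)]
    return [noise_var / (1.0 + a * a - 2.0 * a * math.cos(2.0 * math.pi * f))
            for f in freqs]

def scalar_mi_and_rdf(spectrum, theta):
    """Return (scalar MI in bits, water-filling rate in bits, total distortion)."""
    n = len(spectrum)
    # Entropy-power of V, whose spectrum is max{theta, S}.
    pe_v = math.exp(sum(math.log(max(theta, s)) for s in spectrum) / n)
    sigma_inf_sq = pe_v - theta                      # variance of Z_n
    scalar_mi = 0.5 * math.log2(1.0 + sigma_inf_sq / theta)
    rdf = sum(0.5 * math.log2(s / min(theta, s)) for s in spectrum) / n
    distortion = sum(min(theta, s) for s in spectrum) / n
    return scalar_mi, rdf, distortion

spectrum = ar1_spectrum(a=0.9, noise_var=1.0)
scalar_mi, rdf, distortion = scalar_mi_and_rdf(spectrum, theta=0.5)
```

Here part of the band lies below the water level, so the total distortion is strictly smaller than $\theta$, yet the scalar mutual information still matches the water-filling rate.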



V. Vector-Quantized DPCM and D*PCM 

As mentioned earlier, the structure of the central block
of the channel of Figure 1 is that of a DPCM encoder, with
the scalar quantizer replaced by the AWGN channel $Zq_n = Z_n + N_n$.
However, if we wish to implement the additive
noise by a quantizer whose rate is the mutual information
$I(Z_n; Z_n + N_n)$, we must use vector quantization (VQ). Indeed,
while scalar quantization noise is approximately uniform
over intervals, good high dimensional lattices generate near
Gaussian quantization noise [17]. Yet, how can we combine
VQ and DPCM without violating the sequential nature of
the system? In particular, the quantized sample $Zq_n$ must be
available to generate $V_n$, before the system can predict $U_{n+1}$
and generate $Z_{n+1}$.

One way we can achieve the VQ gain and still retain the
sequential structure of the system is by adding a "spatial"
dimension, i.e., by jointly encoding a large number of parallel
sources, as happens, e.g., in video coding. Figure 5 shows
DPCM encoding of $K$ parallel sources. The spectral shaping
and prediction are done in the time domain for each source
separately. Then, the resulting vector of $K$ prediction errors
is quantized jointly at each time instant by a vector quantizer.
The desired properties of additive quantization error, and rate
which is equal to $K$ times the mutual information
$I(Z_n; Z_n + N_n)$, can be approached in the limit of large $K$ by a suitable
choice of the quantizer. In the next section we discuss one
way to do that using lattice ECDQ.

Fig. 5. DPCM of parallel sources.

What if we have only one source instead of $K$ parallel
sources? If the source has decaying memory, we can still
approximate the parallel source coding approach above, at
the cost of large delay, by using interleaving. We divide the
(pre-filtered) source into $K$ long blocks, which are separately
predicted and then interleaved and jointly quantized as if they
were parallel sources. See Figure 6. This is analogous to
the method used in [11] for combining coding-decoding and
decision-feedback equalization (DFE).

Fig. 6. VQ-DPCM for a single source using interleaving. ($\Pi$ and $\Pi^{-1}$
denote interleaving and de-interleaving, respectively.)

If we do not use any of the above, but restrict ourselves
to scalar quantization ($K = 1$), then we have a
pre/post filtered DPCM scheme. By combining Theorem 1
with known bounds on the performance of (memoryless)
entropy-constrained scalar quantizers (e.g., [18]), we have

$$H(Q(Z_n)) \le R(D) + \frac{1}{2} \log\left(\frac{2\pi e}{12}\right) \tag{39}$$

where $\frac{1}{2} \log(2\pi e/12) \approx 0.254$ bit. See Remark 3 in the
next section regarding scalar/lattice ECDQ. Hence, Theorem 1
implies that in principle, a pre/post filtered DPCM scheme is
optimal, up to the loss of the VQ gain, at all distortion levels
and not only at the high resolution regime.
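The constant in this bound is easy to reproduce. The short sketch below, with hypothetical function names, evaluates $\frac{1}{2}\log_2(2\pi e/12)$ and adds it to a given value of $R(D)$ in bits to obtain the corresponding upper bound on the scalar entropy-coded rate.

```python
import math

def scalar_quantizer_loss_bits():
    """0.5 * log2(2 * pi * e / 12): rate loss of scalar quantization, in bits."""
    return 0.5 * math.log2(2.0 * math.pi * math.e / 12.0)

def scalar_dpcm_rate_bound(rdf_bits):
    """Upper bound on the entropy-coded rate, given R(D) in bits per sample."""
    return rdf_bits + scalar_quantizer_loss_bits()

loss = scalar_quantizer_loss_bits()
bound_at_two_bits = scalar_dpcm_rate_bound(2.0)
```

The loss is a fixed additive term, so its relative weight shrinks as the target rate grows.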

A different approach to combine VQ and prediction is
first to extract the innovation process and then to quantize
it. It is interesting to mention that this method of "open
loop" prediction, which we mentioned earlier regarding the
model of [13], is known in the quantization literature as
D*PCM [12]. The best pre-filter for D*PCM under a high
resolution assumption turns out to be the "half-whitening
filter": $|H_1(e^{j2\pi f})|^2 = 1/\sqrt{S(e^{j2\pi f})}$, with the post-filter
$H_2(e^{j2\pi f})$ being its inverse. But even with this optimum filter,
D*PCM is inferior to DPCM: The optimal distortion gain of
D*PCM over a non-predictive scheme is

$$G_{D^*PCM} = \frac{\sigma_X^2}{\left(\int_{-1/2}^{1/2} \sqrt{S_X(e^{j2\pi f})} \, df\right)^2}$$

(strictly greater than one for non-white spectra by the
Cauchy-Schwarz inequality). Comparing to the optimum prediction
gain obtained by the DPCM scheme:

$$G_{DPCM} = \frac{\sigma_X^2}{P_e(X)},$$

we have:

$$\frac{G_{DPCM}}{G_{D^*PCM}} = \frac{\left(\int_{-1/2}^{1/2} \sqrt{S_X(e^{j2\pi f})} \, df\right)^2}{P_e(X)} = \left(\frac{\sigma_U^2}{P_e(U)}\right)^2$$

where $U_n$ is the pre-filter output in the D*PCM scheme. This
ratio is strictly greater than one for non-white spectra.



VI. ECDQ in a Closed Loop System 

Subtractive dithering of a uniform/lattice quantizer is a 
common approach to make the quantization noise additive. 
As shown in [16], the conditional entropy of the dithered 
lattice quantizer (given the dither) is equal to the mutual 
information in an additive noise channel, where the noise is 
uniform over the lattice cell. Furthermore, for "good" high 
dimensional lattices, the noise becomes closer to a white 
Gaussian process [17]. Thus, ECDQ (entropy-coded dithered 
quantization) provides a natural way to realize the inner 
AWGN channel block of the predictive test-channel. 

One difficulty we observe in this section, however, is that
the results developed in [16] do not apply to the case where
the ECDQ input depends on previous ECDQ outputs and the
entropy coding is conditioned on the past. This situation indeed
happens in predictive coding, when ECDQ is embedded within
a feedback loop. As we shall see, the right measure in this case
is the directed information.



An ECDQ operating on the source $Z_n$ is depicted in Figure 7.
A dither sequence $D_n$, independent of the input sequence
$Z_n$, is added before the quantization and subtracted after. If
the quantizer has a lattice structure of dimension $K \ge 1$, then
we assume that the sequence length is

$$L = MK$$

for some integer $M$, so the quantizer is activated $M$ times. In
this section, we use bold notation for $K$-blocks corresponding
to a single quantizer operation. At each quantizer operation
instant $m$, a dither vector $\mathbf{D}_m$ is independently and uniformly
distributed over the basic lattice cell. The lattice points at the
quantizer output $\mathbf{Q}_m$, $m = 1, \ldots, M$, are fed into an entropy
coder which is allowed to jointly encode the sequence, and has
knowledge of the dither as well; thus for an input sequence of
length $L$ it achieves an average rate of

$$R_{ECDQ} = \frac{1}{L} H(\mathbf{Q}_1^M | \mathbf{D}_1^M) \tag{40}$$

bit per source sample. The entropy coder produces a sequence
$s$ of $\lceil L R_{ECDQ} \rceil$ bits, from which the decoder can recover
$\mathbf{Q}_1, \ldots, \mathbf{Q}_M$, and then subtract the dither to obtain the
reconstruction sequence $Zq_n = Q_n - D_n$, $n = 1, \ldots, L$. The
reconstruction error sequence

$$N_n = Zq_n - Z_n,$$

called in the sequel the "ECDQ noise", has $K$-blocks which
are uniformly distributed over the mirror image of the basic
lattice cell and are mutually i.i.d. [16]. It is further stated in
[16, Thm. 1] that the input and the noise sequences, $\mathbf{Z} = Z_1^L$
and $\mathbf{N} = N_1^L$, are statistically independent, and that the ECDQ
rate is equal to the mutual information over an additive noise
channel with the input $\mathbf{Z}$ and the noise $\mathbf{N}$:

$$R_{ECDQ} = \frac{1}{L} I(\mathbf{Z}; \mathbf{Zq}) = \frac{1}{L} I(\mathbf{Z}; \mathbf{Z} + \mathbf{N}). \tag{41}$$
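For the scalar case ($K = 1$, without feedback), the dithering mechanism can be sketched as follows. The code is illustrative only: the step size, the Gaussian input, the sample count, and all names are assumptions. It adds a dither that is uniform over the quantizer cell, rounds to the nearest lattice point, subtracts the dither, and collects the resulting noise samples, whose empirical variance should be close to the uniform-cell value $\Delta^2/12$ regardless of the input distribution.

```python
import random

def dithered_quantize(z, step, rng):
    """Scalar subtractive dithered uniform quantizer."""
    d = rng.uniform(-step / 2.0, step / 2.0)   # dither, uniform over the basic cell
    q = step * round((z + d) / step)           # lattice point at the quantizer output
    zq = q - d                                 # reconstruction after dither subtraction
    return q, d, zq

rng = random.Random(1)
step = 1.0
inputs = [rng.gauss(0.0, 2.0) for _ in range(20000)]
noise = []
for z in inputs:
    _, _, zq = dithered_quantize(z, step, rng)
    noise.append(zq - z)                       # noise sample: reconstruction minus input
noise_var = sum(n * n for n in noise) / len(noise)
max_abs_noise = max(abs(n) for n in noise)
```

The entropy coder is omitted here; only the additive-noise behavior of the dithered quantizer is shown.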



However, the derivation of [16, Thm. 1] makes the implicit
assumption that the quantizer is used without feedback, that is,
the current input is conditionally independent of past outputs
given the past inputs. (In other words, the dependence on the
past, if it exists, is only due to memory in the source.) When
there is feedback, this condition does not necessarily hold,
which implies that (even with the dither) the sequences $\mathbf{Z}$
and $\mathbf{N}$ are possibly dependent. Specifically, since feedback is
causal, the input $Z_n$ can depend on past values of the ECDQ
noise $N_n$, so their joint distribution in general has the form:

$$f(\mathbf{Z}_1^M, \mathbf{N}_1^M) = \prod_{m=1}^{M} f(\mathbf{N}_m) f(\mathbf{Z}_m | \mathbf{Z}_1^{m-1}, \mathbf{N}_1^{m-1}) \tag{42}$$

where

$$\mathbf{Z}_m = Z_{(m-1)K+1}^{mK}$$

denotes the $m$th $K$-block, and similarly for $\mathbf{N}_m$. In this case,
the mutual information rate of (41) over-estimates the true rate
of the ECDQ.

Massey shows in [15] that for DMCs with feedback, traditional 
mutual information is not a suitable measure, and should 
be replaced by directed information. The directed information 
between the sequences $\mathbf{Z}$ and $\mathbf{Zq} = Zq_1^L$ is defined as 

$$I(\mathbf{Z} \to \mathbf{Zq}) = \sum_{n=1}^{L} I(Z_1^n; Zq_n \mid Zq_1^{n-1}) = \sum_{n=1}^{L} I(Z_n; Zq_n \mid Zq_1^{n-1}) \qquad (43)$$

where the second equality holds whenever the channel from 
$Z_n$ to $Zq_n$ is memoryless, as in our case. In contrast, the 
mutual information between $\mathbf{Z}$ and $\mathbf{Zq}$ is given by 

$$I(\mathbf{Z}; \mathbf{Zq}) = \sum_{n=1}^{L} I(Z_1^L; Zq_n \mid Zq_1^{n-1}) \qquad (44)$$

which by the chain rule for mutual information is in general 
higher. For our purposes, we will define the K-block directed 
information: 

$$I_K(\mathbf{Z} \to \mathbf{Zq}) \triangleq \sum_{m=1}^{M} I(\mathbf{Z}_1^m; \mathbf{Zq}_m \mid \mathbf{Zq}_1^{m-1}) \qquad (45)$$

The following result, proven in Appendix A, extends Massey's 
observation to ECDQ with feedback, and generalizes the result 
of [16, Thm. 1]: 



Theorem 2: (ECDQ Rate with Feedback) The ECDQ 
system with causal feedback defined by (42) satisfies: 

$$R_{ECDQ} = \frac{1}{L} I_K(\mathbf{Z} \to \mathbf{Zq}) = \frac{1}{L} I_K(\mathbf{Z} \to \mathbf{Z} + \mathbf{N}). \qquad (46)$$
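To make the gap between mutual information and directed information concrete, here is a small illustrative sketch for a two-sample jointly Gaussian toy model with feedback. The model is an assumption made only for this example (it is not taken from the paper): $Z_1 = W_1$, $Z_2 = W_2 + a N_1$, and $Zq_n = Z_n + N_n$, with $W_1, W_2, N_1, N_2$ independent. For this model the excess of the mutual information over the directed information works out to $I(N_1; Z_2) = \frac{1}{2}\log(1 + a^2 \sigma_N^2 / \sigma_W^2)$.

```python
import math

def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n, result = len(m), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result

def make_logdet(a, var_w, var_n):
    """Return a function giving log2 det of the covariance of named variables."""
    basis_var = [var_w, var_w, var_n, var_n]   # variances of (W1, W2, N1, N2)
    coeffs = {
        "Z1": [1, 0, 0, 0],
        "Z2": [0, 1, a, 0],    # feedback: Z2 depends on the past noise N1
        "Zq1": [1, 0, 1, 0],   # Zq1 = Z1 + N1
        "Zq2": [0, 1, a, 1],   # Zq2 = Z2 + N2
    }
    def logdet(names):
        if not names:
            return 0.0
        cov = [[sum(x * y * v for x, y, v in zip(coeffs[p], coeffs[q], basis_var))
                for q in names] for p in names]
        return math.log2(det(cov))
    return logdet

def cond_mi(logdet, a_vars, b_vars, c_vars=()):
    """I(A; B | C) in bits for jointly Gaussian variables."""
    a_vars, b_vars, c_vars = list(a_vars), list(b_vars), list(c_vars)
    return 0.5 * (logdet(a_vars + c_vars) + logdet(b_vars + c_vars)
                  - logdet(c_vars) - logdet(a_vars + b_vars + c_vars))

a, var_w, var_n = 0.7, 1.0, 0.5
logdet = make_logdet(a, var_w, var_n)
mutual = cond_mi(logdet, ["Z1", "Z2"], ["Zq1", "Zq2"])                 # as in (44)
directed = (cond_mi(logdet, ["Z1"], ["Zq1"])
            + cond_mi(logdet, ["Z1", "Z2"], ["Zq2"], ["Zq1"]))          # as in (43)
per_letter = (cond_mi(logdet, ["Z1"], ["Zq1"])
              + cond_mi(logdet, ["Z2"], ["Zq2"], ["Zq1"]))              # second form in (43)
gap = mutual - directed
print(mutual, directed, gap)
```

Setting `a = 0` removes the feedback, and the two quantities then coincide, in line with the first remark below.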



Remarks: 

1. When there is no feedback, the past and future input 
blocks $(\mathbf{Z}_1^{m-1}, \mathbf{Z}_{m+1}^{M})$ are conditionally independent of the 
current output block $\mathbf{Zq}_m$ given the current input block $\mathbf{Z}_m$, 
implying by the chain rule that (43) coincides with (44), and 
Theorem 2 reduces to [16, Thm. 1]. 

2. Even for scalar quantization (K = 1), the ECDQ rate 
(40) refers to joint entropy coding of the whole input vector. 
This does not contradict the sequential nature of the system 
since entropy coding can be implemented causally. Indeed, it 
follows from the chain rule for entropy that it is enough to 
encode the instantaneous quantizer output $Q_m$ conditioned on 
past quantizer outputs $Q_1^{m-1}$ and on past and present dither 
samples $D_1^m$, in order to achieve the joint entropy of the 
quantizer in (40). 

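3. If we don't condition the entropy coding on the past, then 
we have 

$$R_{ECDQ} = I(Z_n; Z_n + N_n^{(uniform)}) \qquad (47)$$

$$\le I(Z_n; Z_n + N_n^{(gauss)}) + \frac{1}{2}\log\left(\frac{2\pi e}{12}\right) \qquad (48)$$

$$= R(D) + \frac{1}{2}\log\left(\frac{2\pi e}{12}\right) \qquad (49)$$

where $N_n^{(uniform)}$, the scalar quantization noise, is uniformly 
distributed over the interval $(-\sqrt{12D}/2, +\sqrt{12D}/2)$, and where 
(49) follows from Theorem 1. This implies (39) in the previous 
section. 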
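The constant in (48) and (49) is the entropy difference between a Gaussian and a uniform random variable of equal variance. The short sketch below evaluates it numerically; the function names are hypothetical, and the assumption (stated here, not in the source) is that the uniform noise has the same variance D as the Gaussian noise it replaces.

```python
import math

def gaussian_entropy_bits(variance):
    """Differential entropy of a Gaussian with the given variance, in bits."""
    return 0.5 * math.log2(2 * math.pi * math.e * variance)

def uniform_entropy_bits(variance):
    """Differential entropy of a zero-mean uniform law with the given variance."""
    width = math.sqrt(12 * variance)   # a uniform law of this width has that variance
    return math.log2(width)

def scalar_ecdq_excess_bits(distortion):
    """Rate penalty of uniform (scalar lattice) noise relative to Gaussian noise."""
    return gaussian_entropy_bits(distortion) - uniform_entropy_bits(distortion)

excess = [scalar_ecdq_excess_bits(d) for d in (0.01, 1.0, 25.0)]
closed_form = 0.5 * math.log2(2 * math.pi * math.e / 12)
print(excess, closed_form)
```

The penalty does not depend on D and is roughly a quarter of a bit per sample.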

4. We can embed a K-dimensional lattice ECDQ for K > 1 
in the predictive test channel of Figure 1 instead of the 
additive noise channel, using the Vector-DPCM (VDPCM) 
configuration discussed in the previous section. For good 
lattices, when the quantizer dimension $K \to \infty$, the noise 
process $\mathbf{N}$ in the rate expressions (41) and (46) becomes white 
Gaussian, and the scheme achieves the rate-distortion function. 
Indeed, combining Theorems 1 and 2 we see that the average 
rate per sample of such VDPCM with ECDQ satisfies: 

$$R_{VDPCM-ECDQ} = I(Z_n; Z_n + N_n) = R(D).$$

This implies, in particular, that the entropy coder does not need 
to be conditioned on the past at all, as the predictor handles 
all the memory. However, when the quantization noise is not 
Gaussian, or the predictor is not optimal, the entropy coder can 
use the residual time-dependence after prediction to further 
reduce the coding rate. The resulting rate of the ECDQ would 
be the average directed information between the source and 
its reconstruction as stated in Theorem 2. 



VII. A Dual Relationship with Decision-Feedback Equalization 

In this section we make an analogy between the predictive 
form of the Gaussian RDF and the "information-optimality" 
of decision-feedback equalization (DFE) for colored Gaussian 
channels. As we shall see, a symmetric equivalent form of 
this channel coding problem, including a water-pouring 
transmission filter, an MMSE receive filter and a noise prediction 
feedback loop, exhibits a striking resemblance to the 
pre/post-filtered predictive test-channel of Figure 1. 

We consider the (real-valued) discrete-time time-invariant 
linear Gaussian channel, 

$$R_n = c_n * X_n + Z_n, \qquad (50)$$

where the transmitted signal $X_n$ is subject to a power constraint 
$E[X_n^2] \le P$, the channel dispersion is modeled by a linear 
time-invariant filter $c_n$, and where the channel noise $Z_n$ is 
(possibly colored) Gaussian noise. 

Let $U_n$ represent the data stream which we model as an 
i.i.d. zero-mean Gaussian random process with variance $\sigma_U^2$. 
Further, let $h_{1,n}$ be a spectral shaping filter, satisfying 

$$\sigma_U^2 \int_{-1/2}^{1/2} |H_1(e^{j2\pi f})|^2 \, df \le P \qquad (51)$$

so the channel input $X_n = h_{1,n} * U_n$ indeed satisfies the power 
constraint. For the moment, we make no further assumption 
on $h_{1,n}$. 

The channel (50) has inter-symbol interference (ISI) due to 
the channel filter $c_n$, as well as colored Gaussian noise. Let 
us assume that the channel frequency response is non-zero 
everywhere, and pass the received signal $R_n$ through a 
zero-forcing (ZF) linear equalizer $\frac{1}{C(z)}$, resulting in $Y_n$. We thus 
arrive at an equivalent ISI-free channel, 

$$Y_n = X_n + N_n \qquad (52)$$

where the power spectrum of $N_n$ is 

$$S_N(e^{j2\pi f}) = \frac{S_Z(e^{j2\pi f})}{|C(e^{j2\pi f})|^2}.$$

Fig. 8. MMSE-DFE Scheme in Predictive Form 

The mutual information rate (normalized per symbol) (27) 
between the input and output of the channel (52) is 

$$I(\{X_n\}, \{Y_n\}) = \int_{-1/2}^{1/2} \frac{1}{2} \log\left(1 + \frac{S_X(e^{j2\pi f})}{S_N(e^{j2\pi f})}\right) df. \qquad (53)$$

We note that if the spectral shaping filter $h_{1,n}$ satisfies the 
optimum "water-filling" power allocation condition [5], then 
(53) will equal the channel capacity. 

Similarly to the observations made in Section I with respect 
to the RDF, we note (as reflected in (53)) that capacity 
may be achieved by parallel AWGN coding over narrow 
frequency bands (as done in practice in Discrete Multitone 
(DMT)/Orthogonal Frequency-Division Multiplexing (OFDM) 
systems). An alternative approach, based on time-domain 
prediction rather than the Fourier transform, is offered by 
the canonical MMSE feed-forward equalizer / decision-feedback 
equalizer (FFE-DFE) structure used in single-carrier 
transmission. It is well known that this scheme, coupled with 
AWGN coding, can achieve the capacity of linear Gaussian 
channels. This has been shown using different approaches by 
numerous authors; see [11], [4], [1], [7] and references therein. 
Our exposition closely follows that of Forney [7]. We now 
recount this result, based on linear prediction of the error 
sequence; see the system in Figure 8 and its equivalent channel 
in Figure 9. In the communication literature, this structure is 
referred to as "noise prediction". It can be recast into the more 
familiar FFE-DFE form by absorbing a part of the predictor 
into the estimator filter, forming the usual FFE. 

As a first step, let $\hat{U}_n$ be the optimal MMSE estimator 
of $U_n$ from the equivalent channel output sequence $\{Y_n\}$ of 
(52). Since $\{U_n\}$ and $\{Y_n\}$ are jointly Gaussian and stationary, 
this estimator is linear and time invariant. Note that the 
combination of the ZF equalizer $\frac{1}{C(z)}$ at the receiver 
front-end and the estimator above is equivalent to direct MMSE 
estimation of $U_n$ from the original channel output $R_n$ of (50). 
Denote the estimation error, which is composed in general of 
ISI and Gaussian noise, by $D_n$. Then 

$$U_n = \hat{U}_n + D_n \qquad (54)$$

where $\{D_n\}$ is statistically independent of $\{\hat{U}_n\}$ due to the 
orthogonality principle and Gaussianity. 

Assuming the decoder has access to past symbols $U^- = 
U_{n-1}, U_{n-2}, \ldots$ (see in the sequel), the decoder knows also 
the past estimation errors $D^- = D_{n-1}, D_{n-2}, \ldots$ and may 
form an optimal linear predictor, $\hat{D}_n$, of the current estimation 
error $D_n$, which may then be added to $\hat{U}_n$ to form $V_n$. The 
prediction error $E_n = D_n - \hat{D}_n$ has variance $P_e(D)$, the 
entropy power of $D_n$. It follows that 

$$U_n = \hat{U}_n + D_n = V_n - \hat{D}_n + D_n = V_n + E_n \qquad (55)$$

and therefore 

$$E\{U_n - V_n\}^2 = \sigma_E^2 = E\{D_n - \hat{D}_n\}^2 = P_e(D). \qquad (56)$$



The channel (55), which describes the input/output relation 
of the slicer in Figure 8, is often referred to as the backward 
channel. Furthermore, since $U_n$ and $E_n$ are i.i.d. Gaussian 
and since by the orthogonality principle $E_n$ is independent of 
present and past values of $V_n$ (but dependent on future values 
through the feedback loop), it is a "sequentially additive" 
AWGN channel. See Figure 10 for a geometric view of these 
properties. Notice the strong resemblance with the channel 
(10), $Zq_n = Z_n + N_n$, in the predictive test-channel of the 
RDF: in both channels the output and the noise are i.i.d. and 
Gaussian, but the input has memory and it depends on past 
outputs via the feedback loop. 

Fig. 10. Geometric View of the Estimation Process 

We have therefore derived the following. 



Theorem 3: (Information Optimality of Noise Prediction) 
For stationary Gaussian processes $U_n$ and $N_n$, if 
$H_2(e^{j2\pi f})$ is chosen to be the optimal estimation filter of $U_n$ 
from $Y_n$ and the predictor $g(\cdot)$ is chosen to be the optimal 
prediction filter of $D_n$ (with $L \to \infty$), then the mutual 
information rate (53) of the channel from $X_n$ to $Y_n$ (or from 
$U_n$ to $Y_n$) is equal to the scalar mutual information 

$$I(V_n; V_n + E_n)$$

of the channel (55). Furthermore, if $H_1(e^{j2\pi f})$ is chosen 
such that $S_X(e^{j2\pi f})$ equals the water-filling spectrum of the 
channel input, then this mutual information equals the channel 
capacity. 



Proof: Let $U^- = \{U_{n-1}, U_{n-2}, \ldots\}$ and $D^- = 
\{D_{n-1}, D_{n-2}, \ldots\}$. Using the chain rule of mutual 
information we have 

$$\begin{aligned}
I(\{U_n\}, \{Y_n\}) &= h(\{U_n\}) - h(\{U_n\} \mid \{Y_n\}) \\
&= h(\{U_n\}) - h(U_n \mid \{Y_n\}, U^-) \\
&= h(\{U_n\}) - h(U_n - \hat{U}_n \mid \{Y_n\}, U^-) \\
&= h(\{U_n\}) - h(D_n \mid \{Y_n\}, U^-) \\
&= h(\{U_n\}) - h(D_n \mid \{Y_n\}, D^-) \\
&= h(\{U_n\}) - h(D_n - \hat{D}_n \mid \{Y_n\}, D^-) \\
&= h(\{U_n\}) - h(E_n \mid \{Y_n\}, D^-) \\
&= h(\{U_n\}) - h(E_n) \qquad (57) \\
&= I(V_n; V_n + E_n),
\end{aligned}$$

where $h(\cdot)$ denotes differential entropy rate, and where (57) 
follows from successive application of the orthogonality 
principle [7], since we assumed optimum estimation and prediction 
filters, which are MMSE estimators in the Gaussian setting. 



In view of (53) and (56), and since $I(\{U_n\}, \{Y_n\}) = 
I(\{X_n\}, \{Y_n\})$, Theorem 3 can be re-written as 

$$\int_{-1/2}^{1/2} \frac{1}{2} \log\left(1 + \frac{S_X(e^{j2\pi f})}{S_N(e^{j2\pi f})}\right) df = \frac{1}{2} \log\left(\frac{\sigma_U^2}{\sigma_E^2}\right) \qquad (58)$$

from which we obtain the following well known formula for 
the "SNR at the slicer" for infinite order FFE-DFE, [4], [1], 

$$\frac{\sigma_U^2}{\sigma_E^2} = \exp\left(\int_{-1/2}^{1/2} \log\left(1 + \frac{S_X(e^{j2\pi f})}{S_N(e^{j2\pi f})}\right) df\right).$$
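The slicer-SNR formula can be cross-checked numerically by computing the prediction error variance of $D_n$ directly. The sketch below is an illustration under stated assumptions, not part of the original derivation: it assumes a hypothetical AR(1) noise spectrum, flat shaping ($H_1 = 1$, so $S_X = \sigma_U^2$), the standard MMSE error spectrum $S_D = \sigma_U^2 S_N / (S_X + S_N)$, and a Levinson-Durbin recursion of finite order standing in for the infinite-order predictor.

```python
import math

GRID = 4096
FREQS = [(k + 0.5) / GRID - 0.5 for k in range(GRID)]   # midpoint grid on (-1/2, 1/2)

def noise_spectrum(f, a=0.8):
    # Hypothetical AR(1) equivalent-noise spectrum S_N, for illustration only.
    return 1.0 / (1.0 + a * a - 2.0 * a * math.cos(2.0 * math.pi * f))

def integrate(values):
    return sum(values) / GRID          # the integration interval has length 1

def levinson_error(autocorr, order):
    """Error variance of the optimal linear predictor of the given order."""
    coeffs = [0.0] * (order + 1)
    err = autocorr[0]
    for m in range(1, order + 1):
        acc = autocorr[m] - sum(coeffs[i] * autocorr[m - i] for i in range(1, m))
        k = acc / err                  # reflection coefficient
        updated = coeffs[:]
        updated[m] = k
        for i in range(1, m):
            updated[i] = coeffs[i] - k * coeffs[m - i]
        coeffs = updated
        err *= 1.0 - k * k
    return err

var_u = 4.0                            # symbol variance (assumed flat shaping, H_1 = 1)
s_n = [noise_spectrum(f) for f in FREQS]
s_x = [var_u] * GRID
# Spectrum of the MMSE estimation error D_n of U_n from Y_n = X_n + N_n.
s_d = [var_u * n / (x + n) for x, n in zip(s_x, s_n)]

order = 20
r_d = [integrate([s * math.cos(2.0 * math.pi * f * lag) for s, f in zip(s_d, FREQS)])
       for lag in range(order + 1)]
var_e = levinson_error(r_d, order)     # slicer error variance via prediction
snr_from_prediction = var_u / var_e
snr_from_formula = math.exp(
    integrate([math.log(1.0 + x / n) for x, n in zip(s_x, s_n)]))
print(snr_from_prediction, snr_from_formula)
```

The two printed values should agree closely, since the entropy power of $D_n$ equals its one-step prediction error variance.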



We make a few remarks and interpretations regarding the 
capacity-achieving predictive configuration, which further 
enhance its duality relationship with the predictive realization of 
the RDF. 

Slicing and Coding. We assumed that the decoder has 
access to past symbols. In the simplest realization, this is 
achieved by a decision element ("slicer") that works on a 
symbol-by-symbol basis. In practice however, to approach 
capacity, the slicer must be replaced by a "decoder". Here 
we must actually break with the assumption that $X_n$ is a 
Gaussian process. We implicitly assume that $X_n$ are symbols 
of a capacity-achieving AWGN code. The slicer should be 
viewed as a mnemonic aid where in practice an optimal 
decoder should be used. 

However, we encounter two problems with this 
interpretation. First, the common view of a slicer is as a nearest 
neighbor quantizer. Thus in order to function correctly, the 
noise $E_n$ in (55) must be independent of the symbols $U_n$ and 
not of the estimator $V_n$ (i.e., the channel should be "forward" 
additive: $V_n = U_n + E_n$). This can be achieved by dithering 
the codebook via a modulo-shift as in [6]. This is reminiscent 
of the dithered quantization approach of Section VI. Another 
difficulty is the conflict between the inherent decoding delay 
of a good code, and the sequential nature of the 
noise-prediction DFE configuration. Again (as with vector-DPCM in 
Section V), this may in principle be solved by incorporating 
an interleaver as suggested by Guess and Varanasi [11]. 

Capacity achieving shaping filter. For any spectral shaping 
filter $h_{1,n}$, the mutual information is given by (53). The 
shaping filter $h_{1,n}$ which maximizes the mutual information 
(and yields capacity) under the power constraint (51) is given 
by the parametric water-filling formula: 

$$\sigma_U^2 |H_1(e^{j2\pi f})|^2 = [\theta - S_N(e^{j2\pi f})]^+, \qquad (59)$$

where the "water level" $\theta$ is chosen so that the power constraint 
is met with equality, 

$$\int_{-1/2}^{1/2} S_X(e^{j2\pi f}) \, df = \int_{-1/2}^{1/2} [\theta - S_N(e^{j2\pi f})]^+ \, df = P. \qquad (60)$$

Using this choice, and arbitrarily setting 

$$\sigma_U^2 = \theta \qquad (61)$$

it can be verified that the shaping filter $H_1(e^{j2\pi f})$ and the 
estimation filter $H_2(e^{j2\pi f})$ satisfy the same complex conjugate 
relation as the RDF-achieving pre- and post-filters (13): 

$$H_2(e^{j2\pi f}) = H_1^*(e^{j2\pi f}).$$

Under the same choice, we also have that: 

$$S_D(e^{j2\pi f}) = \min\{S_N(e^{j2\pi f}), \theta\}. \qquad (62)$$
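The water-filling choice and the two properties just stated can be verified on a frequency grid. The sketch below is illustrative only: the AR(1) noise spectrum, the power budget, the bisection search for the water level, and the restriction to a real non-negative $H_1$ are all assumptions made for the example. The estimation filter is computed as the standard Wiener filter for estimating $U_n$ from $Y_n = X_n + N_n$.

```python
import math

GRID = 2048
FREQS = [(k + 0.5) / GRID - 0.5 for k in range(GRID)]

def noise_spectrum(f, a=0.8):
    # Hypothetical AR(1) equivalent-noise spectrum S_N (illustration only).
    return 1.0 / (1.0 + a * a - 2.0 * a * math.cos(2.0 * math.pi * f))

def allocated_power(level, s_n):
    """Grid approximation of the integral of [level - S_N]^+ over (-1/2, 1/2)."""
    return sum(max(level - n, 0.0) for n in s_n) / len(s_n)

def water_level(s_n, power, iterations=200):
    """Bisection for theta such that the allocated power equals `power`."""
    low, high = min(s_n), max(s_n) + power
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if allocated_power(mid, s_n) < power:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)

power = 1.0
s_n = [noise_spectrum(f) for f in FREQS]
theta = water_level(s_n, power)
var_u = theta                                               # the choice (61)

# Real, non-negative shaping filter obeying the water-filling formula (59).
h1 = [math.sqrt(max(theta - n, 0.0) / var_u) for n in s_n]
s_x = [var_u * h * h for h in h1]
# Wiener (MMSE) filter for estimating U_n from Y_n = X_n + N_n.
h2 = [var_u * h / (x + n) for h, x, n in zip(h1, s_x, s_n)]
# Spectrum of the estimation error D_n = U_n - U_hat_n.
s_d = [var_u - (var_u * h) ** 2 / (x + n) for h, x, n in zip(h1, s_x, s_n)]

max_filter_mismatch = max(abs(p - q) for p, q in zip(h1, h2))
max_spectrum_mismatch = max(abs(d - min(n, theta)) for d, n in zip(s_d, s_n))
print(theta, max_filter_mismatch, max_spectrum_mismatch)
```

With this power budget the water level lies below the noise peak, so part of the band receives no power; both mismatch values should nevertheless be at the level of floating-point rounding.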



Shaping, estimation and prediction at high SNR. At 
high signal-to-noise ratio (SNR), the shaping filter $H_1$ and 
the estimation filter $H_2$ become all-pass, and can be replaced 
by scalar multipliers. If we set the symbol variance as in (61), 
then we get at high SNR $\sigma_U^2 \approx P$, so $X_n \approx U_n$ and $\hat{U}_n \approx Y_n$. 
It follows that the estimation error $D_n \approx N_n$, and therefore 
the slicer error $E_n$ becomes simply the prediction error (or 
the entropy power) of the channel noise $N_n$. This is the well 
known "zero-forcing DFE" solution for optimum detection at 
high SNR [1]. We shall next see that the same behavior of the 
slicer error holds even for non-asymptotic conditions. 
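The high-SNR limit can be illustrated numerically. The following sketch assumes, for simplicity, flat shaping ($H_1 = 1$, $\sigma_U^2 = P$) at every power level and a hypothetical AR(1) noise spectrum; neither choice comes from the paper. It evaluates the entropy power of $D_n$ for increasing P and compares it with the entropy power of the noise.

```python
import math

GRID = 4096
FREQS = [(k + 0.5) / GRID - 0.5 for k in range(GRID)]

def noise_spectrum(f, a=0.8):
    # Hypothetical AR(1) equivalent-noise spectrum S_N (illustration only).
    return 1.0 / (1.0 + a * a - 2.0 * a * math.cos(2.0 * math.pi * f))

def entropy_power(spectrum):
    """exp of the average log-spectrum: the one-step prediction error variance."""
    return math.exp(sum(math.log(s) for s in spectrum) / len(spectrum))

s_n = [noise_spectrum(f) for f in FREQS]
noise_entropy_power = entropy_power(s_n)

def slicer_error_variance(power):
    # Assumed flat shaping: S_X = power, so S_D = P * S_N / (P + S_N).
    s_d = [power * n / (power + n) for n in s_n]
    return entropy_power(s_d)

powers = [1.0, 10.0, 100.0, 1e4]
slicer_errors = [slicer_error_variance(p) for p in powers]
print(list(zip(powers, slicer_errors)), noise_entropy_power)
```

As P grows, the slicer error variance increases towards the entropy power of the noise, which is the zero-forcing DFE behavior described above.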

The prediction process when the Shannon upper bound 
is tight. The Shannon upper bound (SUB) on capacity states 
that 

$$C \le \frac{1}{2} \log(2\pi e \sigma_Y^2) - h(N) = \frac{1}{2} \log\left(\frac{P + \sigma_N^2}{P_e(N)}\right) \qquad (63)$$

where 

$$\sigma_N^2 = \int_{-1/2}^{1/2} S_N(e^{j2\pi f}) \, df$$

is the variance of the equivalent noise, and where equality 
holds if and only if the output $Y_n$ is white. This in turn is 
satisfied if and only if 

$$\theta \ge \max_f S_N(e^{j2\pi f}),$$

in which case $\theta = P + \sigma_N^2$. 

If we choose $\sigma_U^2$ according to (61), we have: 

• The shaping and estimation filters satisfy 

$$|H_1(e^{j2\pi f})|^2 = |H_2(e^{j2\pi f})|^2 = 1 - \frac{S_N(e^{j2\pi f})}{\theta}.$$

• $U_n$ and $Y_n$ are white, with the same variance $\theta$. 

• $X_n$ and $\hat{U}_n$ have the same power spectrum, 
$\theta - S_N(e^{j2\pi f})$. 

• The power spectrum of $D_n$ is equal to the power spectrum 
of the noise $N_n$, $S_N(e^{j2\pi f})$. Consequently, the variance 
of $E_n$, which is equal to the entropy-power of $D_n$, is 
equal to $P_e(N)$. 

As a consequence we have 

$$\begin{aligned}
I(V_n; V_n + E_n) &= h(U_n) - h(E_n) \\
&= h(\mathcal{N}(0, \theta)) - h(\mathcal{N}(0, P_e(N))) \\
&= \frac{1}{2} \log\left(\frac{P + \sigma_N^2}{P_e(N)}\right)
\end{aligned}$$

which is indeed the SUB (63). 
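A quick numerical check of the tight case is sketched below. The AR(1) noise spectrum and the power value are assumptions chosen so that the water level exceeds the noise peak; the variable names are illustrative only. The capacity is evaluated from (53) with the water-filling spectrum $S_X = \theta - S_N$ and compared with the closed-form bound.

```python
import math

GRID = 4096
FREQS = [(k + 0.5) / GRID - 0.5 for k in range(GRID)]

def noise_spectrum(f, a=0.5):
    # Hypothetical AR(1) equivalent-noise spectrum S_N (illustration only).
    return 1.0 / (1.0 + a * a - 2.0 * a * math.cos(2.0 * math.pi * f))

def mean(values):
    return sum(values) / len(values)

s_n = [noise_spectrum(f) for f in FREQS]
noise_var = mean(s_n)                                        # sigma_N^2
noise_entropy_power = math.exp(mean([math.log(n) for n in s_n]))   # P_e(N)

power = 5.0
theta = power + noise_var                                    # water level in the tight case
is_tight_regime = theta >= max(s_n)                          # water covers the whole band

# Mutual information rate (53) in bits, with S_X = theta - S_N at every frequency.
capacity_bits = mean([0.5 * math.log2(theta / n) for n in s_n])
sub_bits = 0.5 * math.log2((power + noise_var) / noise_entropy_power)
print(is_tight_regime, capacity_bits, sub_bits)
```

When `is_tight_regime` is true, the two rates coincide up to numerical precision; for a smaller power budget the water level would have to be found by a search, and the bound would be strict.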

An alternative derivation of Theorem 3 in the spectral 
domain. Similarly to the alternative proof of Theorem 1, one 
can prove Theorem 3 using the spectra derived above. 



VIII. Summary 

We demonstrated the dual role of prediction in 
rate-distortion theory of Gaussian sources and capacity of ISI 
channels. These observations shed light on the configurations 
of DPCM (for source compression) and FFE-DFE (for channel 
demodulation), and show that in principle they are 
"information lossless" for any distortion / SNR level. The theoretical 
bounds, RDF and capacity, can be approached in practice by 
appropriate use of feedback and linear estimation in the time 
domain combined with coding across the "spatial" domain. 

A prediction-based system has in many cases a delay lower 
than that of a frequency domain approach, as is well known in 
practice. We touched briefly on this issue when discussing the 
0.5 bit loss due to avoiding the ("non-causal") pre/post filters. 
But the full potential of this aspect requires further study. 

It is tempting to ask whether the predictive form of the 
RDF can be extended to more general sources and distortion 
measures (and similarly for capacity of more general ISI 
channels). Yet, examination of the arguments in our derivation 
reveals that it is strongly tied to the quadratic-Gaussian case: 

• The orthogonality principle, implied by the MMSE 
criterion, guarantees the whiteness of the noisy prediction 
error $Zq_n$ and its orthogonality with the past. 

• Gaussianity implies that orthogonality is equivalent to 
statistical independence. 

For other error criteria and/or non-Gaussian sources, prediction 
(either linear or non-linear) is in general unable to remove the 
dependence on the past. Hence the scalar mutual information 
over the prediction error channel would in general be greater 
than the mutual information rate of the source before 
prediction. 
Appendix 

A. Proof of Theorem 2 

It will be convenient to look at K-blocks, which we denote 
by bold letters as in Section VI. Substituting the ECDQ rate 
definition (40) and the K-block directed information definition 
(45), the required result (46) becomes: 

$$H(\mathbf{Q}_1^M \mid \mathbf{D}_1^M) = \sum_{m=1}^{M} I(\mathbf{Z}_1^m; \mathbf{Zq}_m \mid \mathbf{Zq}_1^{m-1}).$$

Using the chain rule for entropies, it is enough to show that: 

$$H(\mathbf{Q}_m \mid \mathbf{Q}_1^{m-1}, \mathbf{D}_1^M) = I(\mathbf{Z}_1^m; \mathbf{Zq}_m \mid \mathbf{Zq}_1^{m-1}).$$

To that end, we have the following sequence of equalities: 

$$\cdots = I(\mathbf{Zq}_m; \mathbf{Z}_1^m \mid \mathbf{Zq}_1^{m-1}) - I(\mathbf{D}_m; \mathbf{Z}_1^m \mid \mathbf{Zq}_1^{m-1}) = I(\mathbf{Z}_1^m; \mathbf{Zq}_m \mid \mathbf{Zq}_1^{m-1}).$$

In this sequence, equality (a) comes from the independent 
dither generation and causality of feedback. (b) is justified 
because $\mathbf{Q}_m$ is a deterministic function of the elements on 
which the subtracted entropy is conditioned, thus the entropy 
is 0. In (c) we subtract from the left hand side argument 
of the mutual information one of the variables upon which 
the mutual information is conditioned. (d) and (e) hold since 
each dither vector $\mathbf{D}_m$ is a deterministic function of the 
corresponding quantizer output $\mathbf{Zq}_m$. Finally, (f) is true since 
$\mathbf{Z}_1^m$ is independent of $\mathbf{D}_m$ (both conditioned on past quantized 
values and unconditioned). 



References 

[1] J. Barry, E. A. Lee and D. G. Messerschmitt. Digital Communication. 
Kluwer Academic Press, 2004 (third edition). 

[2] T. Berger. Rate Distortion Theory: A Mathematical Basis for Data 
Compression. Prentice-Hall, Englewood Cliffs, NJ, 1971. 

[3] J. Chen, C. Tian, T. Berger, and S. S. Hemami. Multiple Description 
Quantization via Gram-Schmidt Orthogonalization. IEEE Trans. 
Information Theory, IT-52:5197-5217, Dec. 2006. 

[4] J. M. Cioffi, G. P. Dudevoir, M. V. Eyuboglu, and G. D. Forney, Jr. MMSE 
Decision-Feedback Equalizers and Coding - Part I: Equalization Results. 
IEEE Trans. Communications, COM-43:2582-2594, Oct. 1995. 

[5] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley, 
New York, 1991. 

[6] U. Erez and R. Zamir. Achieving 1/2 log(1 + SNR) on the AWGN 
channel with lattice encoding and decoding. IEEE Trans. Information 
Theory, IT-50:2293-2314, Oct. 2004. 

[7] G. D. Forney, Jr. Shannon meets Wiener II: On MMSE estimation in 
successive decoding schemes. In 42nd Annual Allerton Conference on 
Communication, Control, and Computing, Allerton House, Monticello, 
Illinois, Oct. 2004. 

[8] R. G. Gallager. Information Theory and Reliable Communication. Wiley, 
New York, N.Y., 1968. 

[9] A. Gersho and R. M. Gray. Vector Quantization and Signal Compression. 
Kluwer Academic Pub., Boston, 1992. 

[10] J. D. Gibson, T. Berger, T. Lookabaugh, D. Lindbergh, and R. L. Baker. 
Digital Compression for Multimedia: Principles and Standards. Morgan 
Kaufmann Pub., San Francisco, 1998. 

[11] T. Guess and M. K. Varanasi. An information-theoretic framework 
for deriving canonical decision-feedback receivers in Gaussian channels. 
IEEE Trans. Information Theory, IT-51:173-187, Jan. 2005. 

[12] N. S. Jayant and P. Noll. Digital Coding of Waveforms. Prentice-Hall, 
Englewood Cliffs, NJ, 1984. 

[13] K. T. Kim and T. Berger. Sending a Lossy Version of the Innovations 
Process is Suboptimal in QG Rate-Distortion. In Proceedings of 
ISIT-2005, Adelaide, Australia, pages 209-213, 2005. 

[14] T. Linder and R. Zamir. Causal coding of stationary sources and 
individual sequences with high resolution. IEEE Trans. Information 
Theory, IT-52:662-680, Feb. 2006. 

[15] J. Massey. Causality, Feedback and Directed Information. In Proc. IEEE 
Int. Symp. on Information Theory, pages 303-305, 1990. 

[16] R. Zamir and M. Feder. On universal quantization by randomized 
uniform / lattice quantizer. IEEE Trans. Information Theory, pages 
428-436, March 1992. 

[17] R. Zamir and M. Feder. On lattice quantization noise. IEEE Trans. 
Information Theory, pages 1152-1159, July 1996. 

[18] R. Zamir and M. Feder. Information rates of pre/post filtered dithered 
quantizers. IEEE Trans. Information Theory, pages 1340-1353, 
September 1996. 



